The Preposition Corpus in Sketch Engine

Author

  • Ken Litkowski
Abstract

Corpora from the Pattern Dictionary of English Prepositions (PDEP) provide the basis for examining the behavior of 304 English prepositions, with 1040 senses (patterns) described in 20 fields. The PDEP corpora comprise dependency parses for 81,509 sentences in CoNLL-X format, previously used in several studies, particularly for disambiguation modeling. We have now put these parse data into Sketch Engine (SE), using its mechanisms to gain further perspectives on preposition behavior. In the process, we have also annotated each parse with additional information that provides an even richer set of data for examining preposition behavior. For each sentence, Sketch Engine identifies the sense number and the direct link to the PDEP sense description; these references can be displayed for each concordance line. Sketch Engine data for each sentence also includes the PDEP class, the subclass, and supersense tags for the preposition complements and governors (i.e., semantic relations using the WordNet lexicographer file class). We describe in detail how the corpora were prepared for SE, which involved several scripts used in PDEP to access its databases. Some of these scripts provide additional entry points into the PDEP data, particularly for the class and subclass. We describe the use of WordNet noun, verb, adjective, and adverb supersenses to tag the complements and governors, i.e., semantic word sketches for the prepositions. The resultant preposition data within SE provides a perspective different from its usual focus on the main parts of speech, so we describe the unique aspects enabled for the PDEP corpora in considerable detail. We describe several corpus query language (CQL) queries that provide useful perspectives on preposition behavior, particularly showing preposition collocations, (semantic) word sketches, preposition thesauruses, and preposition sketch differences.
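
To make the notions of governor and complement concrete, the following is a minimal sketch of how they might be read off a CoNLL-X dependency parse. It is not the PDEP preparation script. The column order (ID, FORM, LEMMA, CPOSTAG, POSTAG, FEATS, HEAD, DEPREL, ...) follows the CoNLL-X format, but the sample sentence, its dependency labels, and the function names `parse_conllx` and `governor_and_complement` are assumptions made for illustration. The sketch also assumes that the governor is the head of the preposition and that the complement is its first dependent.

```python
from typing import List, NamedTuple, Optional, Tuple


class Token(NamedTuple):
    """The first eight CoNLL-X columns for one token."""
    id: int
    form: str
    lemma: str
    cpostag: str
    postag: str
    feats: str
    head: int      # ID of the head token; 0 means the token is the root
    deprel: str


def parse_conllx(block: str) -> List[Token]:
    """Parse one sentence given as tab-separated CoNLL-X lines."""
    tokens = []
    for line in block.strip().splitlines():
        cols = line.split("\t")
        if len(cols) < 8:
            continue  # skip blank or malformed lines
        tokens.append(Token(int(cols[0]), cols[1], cols[2], cols[3],
                            cols[4], cols[5], int(cols[6]), cols[7]))
    return tokens


def governor_and_complement(
    tokens: List[Token], prep_id: int
) -> Tuple[Optional[Token], Optional[Token]]:
    """Return (governor, complement) for the preposition with ID prep_id.

    Assumption: governor = head of the preposition,
    complement = first token whose head is the preposition.
    """
    by_id = {t.id: t for t in tokens}
    prep = by_id[prep_id]
    governor = by_id.get(prep.head)  # None when the preposition is the root
    complement = next((t for t in tokens if t.head == prep_id), None)
    return governor, complement


# Hypothetical toy parse of "She sat on the bench ." (labels are illustrative).
ROWS = [
    ["1", "She", "she", "PRP", "PRP", "_", "2", "SBJ", "_", "_"],
    ["2", "sat", "sit", "VBD", "VBD", "_", "0", "ROOT", "_", "_"],
    ["3", "on", "on", "IN", "IN", "_", "2", "ADV", "_", "_"],
    ["4", "the", "the", "DT", "DT", "_", "5", "NMOD", "_", "_"],
    ["5", "bench", "bench", "NN", "NN", "_", "3", "PMOD", "_", "_"],
    ["6", ".", ".", ".", ".", "_", "2", "P", "_", "_"],
]
SAMPLE = "\n".join("\t".join(row) for row in ROWS)

tokens = parse_conllx(SAMPLE)
governor, complement = governor_and_complement(tokens, prep_id=3)
print(governor.form, complement.form)
```

Under these assumptions, the governor of "on" is the verb "sat" and its complement is the noun "bench"; these are the two positions that the abstract describes as carrying WordNet supersense tags.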

Similar resources

Hindi Word Sketches

Word sketches are one-page automatic, corpus-based summaries of a word’s grammatical and collocational behaviour. These are widely used for studying a language and in lexicography. Sketch Engine is a leading corpus tool which takes as input a corpus and generates word sketches for the words of that language. It also generates a thesaurus and ‘sketch differences’, which specify similarities and ...

Using Chinese Gigaword Corpus and Chinese Word Sketch in linguistic Research

We explore the possibility of deeper linguistic research based on corpus and computational linguistic tools in this paper. In particular, we adopt Chinese Word Sketch, the application of Word Sketch Engine to Chinese GigaWord Corpus, for linguistic research. We apply Chinese Sketch Engine results to deeper linguistic account such as selectional restriction and event type selection. The study is...

European Union Language Resources in Sketch Engine

Several parallel corpora built from European Union language resources are presented here. They were processed by state-of-the-art tools and made available for researchers in the Sketch Engine corpus management system. A completely new resource is introduced: EUR-Lex corpus, being one of the largest parallel corpus available at the moment, containing 840 million tokens of English and having the ...

Chinese Sketch Engine and the Extraction of Grammatical Collocations

This paper introduces a new technology for collocation extraction in Chinese. Sketch Engine (Kilgarriff et al., 2004) has proven to be a very effective tool for automatic description of lexical information, including collocation extraction, based on large-scale corpus. The original work of Sketch Engine was based on BNC. We extend Sketch Engine to Chinese based on Gigaword corpus from LDC. We d...

The Sketch Engine

Word sketches are one-page automatic, corpus-based summaries of a word’s grammatical and collocational behaviour. They were first used in the production of the Macmillan English Dictionary and were presented at Euralex 2002. At that point, they only existed for English. Now, we have developed the Sketch Engine, a corpus tool which takes as input a corpus of any language and a corresponding gram...


Publication date: 2017